Multi-GPU parallelism in TensorFlow

  • A single host with one CPU;
  • A single host with multiple GPUs;
  • Multiple hosts, each with a CPU or multiple GPUs.

See the multi-gpu project.
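To make the three setups above concrete, here is a minimal sketch of the TF 1.x device strings involved; which devices actually exist depends on the machine and, for the last case, on the cluster spec (the /job:worker/task:0 name is only an example):

import tensorflow as tf

# Single host, CPU only: pin ops to the CPU.
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0])

# Single host, multiple GPUs: pick a GPU by index.
with tf.device('/gpu:1'):
    b = a * 2.0

# Multiple hosts: address a device through its job/task in a cluster spec.
with tf.device('/job:worker/task:0/gpu:0'):
    c = b + 1.0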

Basic TF GPU operations and parameters

Simple computation
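A minimal sketch of a simple GPU computation together with the commonly used GPU-related session parameters in TF 1.x (the matrix sizes are arbitrary):

import tensorflow as tf

# A simple computation placed explicitly on the first GPU.
with tf.device('/gpu:0'):
    a = tf.random_normal([1000, 1000])
    b = tf.random_normal([1000, 1000])
    c = tf.matmul(a, b)

# Commonly used GPU-related session parameters:
#   log_device_placement  - log which device each op is assigned to
#   allow_soft_placement  - fall back to the CPU when an op has no GPU kernel
#   allow_growth          - allocate GPU memory on demand instead of all at once
config = tf.ConfigProto(log_device_placement=True,
                        allow_soft_placement=True)
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    print(sess.run(c))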

Example 2: cifar10

  1. Data parallelism: each GPU processes its own batch and computes gradients in parallel
  2. Gradient combination (averaging) across GPUs
  3. Loss combination

average_gradients combines the gradients computed on the different GPUs.
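A sketch of average_gradients along the lines of the cifar10 tutorial: tower_grads is a list with one entry per GPU, each entry being the (gradient, variable) pairs returned by opt.compute_gradients for that tower.

import tensorflow as tf

def average_gradients(tower_grads):
    """Average gradients variable by variable across all towers."""
    average_grads = []
    # zip(*tower_grads) groups the (grad, var) pairs belonging to the same variable.
    for grad_and_vars in zip(*tower_grads):
        # Stack the per-tower gradients along a new axis and average them.
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        grad = tf.reduce_mean(tf.concat(grads, axis=0), 0)
        # The variable object is shared across towers, so take it from the first tower.
        var = grad_and_vars[0][1]
        average_grads.append((grad, var))
    return average_grads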

Core code

tower_grads = []
for i in xrange(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
        with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
            # Give each GPU its own batch of data.
            image_batch, label_batch = batch_queue.dequeue()
            # Calculate the loss for one tower of the CIFAR model. This function
            # constructs the entire CIFAR model but shares the variables across
            # all towers.
            loss = tower_loss(scope, image_batch, label_batch)

            # Reuse variables for the next tower.
            tf.get_variable_scope().reuse_variables()

            # Retain the summaries from the final tower.
            summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)

            # Calculate the gradients for the batch of data on this CIFAR tower.
            grads = opt.compute_gradients(loss)

            # Keep track of the gradients across all towers.
            tower_grads.append(grads)

# Parallelism across the GPUs:
# We must calculate the mean of each gradient. Note that this is the
# synchronization point across all towers.
# How is the synchronization done?
grads = average_gradients(tower_grads)

Question: when exactly does the synchronization happen? Since average_gradients consumes the gradient tensors of every tower, the averaging ops (and anything built on top of them, such as apply_gradients) cannot execute until all towers have finished their backward pass, so a single session.run of the training op is the implicit synchronization point.
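For context, a sketch of how the averaged gradients are typically wired into a single training op; opt, grads and loss come from the snippet above, while global_step is assumed to be defined as in the cifar10 tutorial:

# Apply the averaged gradients once. This op depends on every tower's
# gradient tensors, so a single run of it forces all GPUs to finish
# their forward/backward pass first.
apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)

config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    _, loss_value = sess.run([apply_gradient_op, loss])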

Example 3: multi-GPU parallelism in tensor2tensor

https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/devices.py#L61

How the input data and the compute resources (devices) are distributed across towers.
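devices.py mainly decides which device strings to use and hands them to a Parallelism helper that shards the batch and calls the model once per device. A generic sketch of that pattern, not the actual tensor2tensor API (data_parallel and model_fn are hypothetical names):

import tensorflow as tf

def data_parallel(model_fn, devices, features, labels):
    """Run model_fn once per device, each on its own shard of the batch,
    and return the list of per-device outputs (e.g. per-tower losses)."""
    # Shard the batch along axis 0, one shard per device.
    feature_shards = tf.split(features, len(devices), axis=0)
    label_shards = tf.split(labels, len(devices), axis=0)

    outputs = []
    for i, device in enumerate(devices):
        # Reuse the model variables on every device after the first one.
        with tf.device(device), tf.variable_scope(
                tf.get_variable_scope(), reuse=(i > 0)):
            outputs.append(model_fn(feature_shards[i], label_shards[i]))
    return outputs

With two GPUs this would be called roughly as data_parallel(tower_loss_fn, ['/gpu:0', '/gpu:1'], images, labels), and the per-tower losses or gradients are then combined as in the cifar10 example above.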

References